Abstract

Geometric crossover is a representation-independent definition of crossover. It is based on the distance of the search space, interpreted as a metric space. It generalizes the traditional crossover for binary strings and other important recombination operators for the most frequently used representations. Using a distance tailored to the problem at hand, the abstract definition of crossover can be used to design new problem-specific crossovers that embed problem knowledge in the search. This paper is motivated by the fact that genotype-phenotype mapping can be theoretically interpreted using the mathematical concept of quotient space. In this paper, we study a metric transformation, the quotient metric space, that gives rise to the notion of quotient geometric crossover. This turns out to be a very versatile notion. We give many example applications of the quotient geometric crossover.
Keywords: Geometric crossover, metric transformation, quotient metric space, quotient 
geometric crossover. 

1 Introduction 

Geometric crossover and geometric mutation are representation-independent search operators. They generalize many pre-existing search operators for the major representations used in evolutionary algorithms, such as binary strings [12], real vectors [23], permutations [15], permutations with repetitions [H], syntactic trees [13], and sequences [17]. They are defined in geometric terms using the notions of line segment and ball. These notions, and the corresponding genetic operators, are well-defined once a notion of distance in the search space is defined. Defining search operators as functions of the search space is the opposite of the standard way [6], in which the search space is seen as a function of the search operators employed. This viewpoint greatly simplifies the relationship between search operators and fitness landscape. It has allowed us to give simple rules of thumb for building crossover operators that are likely to perform well.

Theoretical results on metric spaces can naturally lead to interesting results for geometric crossover. In particular, in previous work [16] we have shown that the notion of metric transformation has great potential for geometric crossover. A metric transformation is an operator that constructs new metric spaces from pre-existing ones: it takes one or more metric spaces as input and outputs a new metric space. The notion of metric transformation becomes extremely interesting when considered together with distances firmly rooted in the syntactic structure of the underlying solution representation (e.g., edit distance). In these cases it gives rise to a simple and natural interpretation in terms of syntactic transformations.

In previous work [16] we extended the geometric framework by introducing the notion of product crossover, which is associated with the Cartesian product of metric spaces. This is a very important tool. It allows one to build new geometric crossovers customized to problems with mixed representations by combining pre-existing geometric crossovers in a straightforward way. Using the product geometric crossover, we have also shown that traditional crossovers for symbolic vectors and blend crossovers for integer and real vectors are geometric crossovers.

In this paper we extend the geometric framework by introducing the important notion of quotient geometric crossover. The metric transformation associated with it is the quotient metric space. A quotient space can be regarded as a mathematical definition of phenotype space in evolutionary computation theory. Taking the quotient reduces the space actually searched by geometric crossover. It also introduces problem knowledge into the search by using a distance better tailored to the specific solution interpretation. Quotient geometric crossover is applied directly to the genotype space, but it has the same effect as a crossover performed on the phenotype space.

The paper is organized as follows.

- In Section 2 we present the geometric framework, including the notion of geometricity-preserving transformation.
- In Section 3 we introduce the notion of quotient geometric crossover.
- In Section 4 we study several useful applications related to quotient geometric crossover.
  - In Sections 4.1 and 4.2 we show how groupings [11] and graphs can be recast and understood more simply in terms of quotient geometric crossover. Here, quotient geometric crossover is used to filter out inherent redundancy in the solution representation.
  - In Section 4.3 we show how homologous crossover for variable-length sequences [17] can be understood as a quotient geometric crossover.
  - In Section 4.4 we discuss the use of quotient geometric crossover for the traveling salesman problem.
  - In Section 4.5 we consider functional representation and show how the concept of quotient geometric crossover is connected to the search of functions. Genetic programming, finite state machines, and neural networks are shown as examples.
  - In Section 4.6 we explain how quotient geometric crossover can be used to understand the interaction between crossover and neutral code.
- In Section 5 we give conclusions.
2 Geometric Framework



2.1 Geometric Preliminaries 

In the following we give the necessary preliminary geometric definitions and extend those introduced in [12, 13]. The following definitions are taken from [3].

The terms distance and metric denote any real-valued function that conforms to the axioms of identity, symmetry, and triangle inequality. In a metric space (S, d), a line segment (or closed interval) is a set of the form

$$[x;y]_d = \{z \in S \mid d(x,z) + d(z,y) = d(x,y)\},$$

where $x, y \in S$ are called the extremes of the segment. The metric segment generalizes the familiar notion of segment in Euclidean space to any metric space through distance redefinition. Notice that, unlike in Euclidean space, a metric segment does not in general coincide with a single shortest path (geodesic) connecting its extremes. There may be more than one geodesic connecting two extremes, and the metric segment is the union of all geodesics.
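As an illustration that is not part of the original framework, the following Python sketch enumerates a metric segment between two binary strings under Hamming distance by brute force. The function names `hamming` and `metric_segment` are our own, hypothetical helpers. Between 000 and 011 there are two geodesics, one through 001 and one through 010. The segment is their union.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Hamming distance between two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def metric_segment(x: str, y: str, alphabet: str = "01") -> set[str]:
    """All z with H(x, z) + H(z, y) == H(x, y), found by exhaustive enumeration."""
    d_xy = hamming(x, y)
    candidates = ("".join(t) for t in product(alphabet, repeat=len(x)))
    return {z for z in candidates if hamming(x, z) + hamming(z, y) == d_xy}


if __name__ == "__main__":
    print(sorted(metric_segment("000", "011")))  # ['000', '001', '010', '011']
```

Enumeration is exponential in the string length, so this is only a way to inspect small cases.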

We assign a structure to the solution set S by endowing it with a notion of distance d. M = (S, d) is therefore a solution space, and (M, f) is the corresponding fitness landscape, where f is the fitness function over S.

2.2 Definition of Geometric Crossover 

The following definitions are representation-independent and therefore applicable to any representation.

Definition 1 (Image set). The image set Im[OP] of a genetic operator OP is the set of all possible offspring produced by OP.

Definition 2 (Geometric crossover). A binary operator GX is a geometric crossover under the metric d if all offspring are in the segment between its parents x and y, i.e.,

$$\mathrm{Im}[GX(x,y)] \subseteq [x;y]_d.$$

A number of general properties of geometric crossover and geometric mutation have been derived in [12]. Traditional crossover is geometric under Hamming distance. Among crossovers for permutations, PMX, a well-known crossover, is geometric under swap distance. We also found that cycle crossover, another traditional crossover for permutations, is geometric under both swap distance and Hamming distance.
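The claim that traditional crossover is geometric under Hamming distance can be checked exhaustively on small instances. The sketch below takes uniform (mask-based) crossover as the representative traditional crossover; this choice is ours, not the source's. One-point and two-point crossover use particular masks, so their image sets are subsets of this one and pass the same check. All names are illustrative.

```python
from itertools import product


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def uniform_crossover(x, y, mask):
    """Take gene i from x where mask[i] == 0 and from y where mask[i] == 1."""
    return tuple(b if m else a for a, b, m in zip(x, y, mask))


def image_set(x, y):
    """Im[UX(x, y)]: every offspring reachable with some mask."""
    return {uniform_crossover(x, y, m) for m in product((0, 1), repeat=len(x))}


def is_geometric_on(x, y):
    """Definition 2 for one pair of parents: every offspring lies in [x; y]_H."""
    d = hamming(x, y)
    return all(hamming(x, z) + hamming(z, y) == d for z in image_set(x, y))


if __name__ == "__main__":
    parents = list(product((0, 1), repeat=4))
    print(all(is_geometric_on(x, y) for x in parents for y in parents))  # True
```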

2.3 Formal Evolutionary Algorithm and Problem Knowledge 

Geometric operators are defined as functions of the distance associated with the search space. However, the search space does not come with the problem itself. The problem consists only of a fitness function to optimize, which defines what a solution is and how to evaluate it. It does not impose any structure on the solution set. Putting a structure over the solution set is part of the search algorithm design and is a designer's choice.

A fitness landscape is the fitness function plus a structure over the solution space. So for each problem there is one fitness function, but as many fitness landscapes as there are possible structures over the solution set. In principle, the designer could choose the structure completely independently of the problem at hand. However, because the search operators are defined over this structure, doing so would decouple them from the problem at hand. The search would then become something very close to random search.

To avoid this, one can exploit problem knowledge in the search. This can be achieved by carefully designing the connectivity structure of the fitness landscape. For example, one can study the objective function of the problem and select a neighborhood structure that couples the distance between solutions and their fitness values. Once this is done, search operators can exploit problem knowledge and perform better than random search. This holds even if the search operators are problem-independent, as is the case for geometric crossover and geometric mutation. Indeed, the fitness landscape is a knowledge interface between the problem at hand and a formal, problem-independent search algorithm.

Under which conditions is a landscape well-searchable by geometric operators? As a rule of thumb, geometric mutation and geometric crossover work well on landscapes where the closer two solutions are, the more correlated their fitness values are. This is no surprise. The importance of landscape smoothness has been advocated in many different contexts and confirmed in countless empirical studies with many neighborhood search meta-heuristics [20]. We operate according to the following rules of thumb:

Rule of thumb 1: if we have a good distance for the problem at hand, then we have a good geometric mutation and a good geometric crossover.

Rule of thumb 2: a good distance for the problem at hand is a distance that makes the landscape "smooth."

2.4 Geometricity-Preserving Transformation 

In previous work we have proven that a number of important pre-existing recombination operators for the most frequently used representations are geometric crossovers. We have also applied the abstract definition of geometric crossover to distances firmly rooted in a specific solution representation, and designed brand-new crossovers this way.

An appealing way to build new geometric crossovers is to start from recombination operators that are known to be geometric. New geometric crossovers are then derived from them through geometricity-preserving transformations or combinations, i.e., operations that return geometric crossovers when applied to geometric crossovers.

The definition of geometric crossover is based on the notion of metric. A natural starting point in the search for geometricity-preserving transformations is therefore to consider transformations of the underlying metrics that are known to return metric spaces. One can then study how the geometric crossover associated with the transformed metric space relates to the geometric crossover associated with the original metric space.

There are a number of metric space transformations [3, 21] that are potentially of interest for geometric crossover:

- sub-metric space
- product space
- quotient metric space
- gluing metric space
- combinatorial transformation
- non-negative combinations of metric spaces
- Hausdorff transformation
- concave transformation

Let us consider two geometric crossovers. X is associated with the original metric space M. X' is associated with the transformed metric space M' = mt(M), where mt is the metric transformation.

The functional relationship among metric spaces and geometric crossovers can be expressed through a commutative diagram (Figure 1). In it, gx denotes the application of the formal definition of geometric crossover. gt denotes the induced geometricity-preserving crossover transformation associated with the metric transformation mt.

This diagram becomes remarkably interesting when the induced transformation gt has a simple interpretation in terms of syntactic manipulation. In that case, one can obtain new geometric crossovers from recombination operators that are known to be geometric, simply by applying geometricity-preserving syntactic manipulations.

Figure 1: Commutative diagram linking metric and crossover transformations. mt maps M to M'; gx maps M to X and M' to X'; gt maps X to X'.

We study those metric transformations whose induced geometricity-preserving transformations have a simple and natural interpretation on the solution representation.

3 Quotient Geometric Crossover 
3.1 Quotient Metric Space 

Let (S, d) be a metric space and ~ be an equivalence relation on S. Consider the quotient space $S/{\sim}$. We now define a metric on $S/{\sim}$ induced by the original metric d on S.

Definition 3 (Induced distance measure). For $\bar{x}, \bar{y} \in S/{\sim}$,

$$d_\sim(\bar{x}, \bar{y}) := \inf_{x \in \bar{x},\, y \in \bar{y}} d(x, y).$$

Then, the following theorem holds pQ. 

Theorem 1. If the equivalence relation ~ arises from an isometry subgroup¹, then $d_\sim$ is a metric on $S/{\sim}$.

The metric space $(S/{\sim}, d_\sim)$ is called the quotient metric space. In the applications below, we prove directly that $d_\sim$ is a metric, rather than showing that the related equivalence relation ~ comes from an isometry subgroup.
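The following is a simple case, our own illustration, in which the hypothesis of Theorem 1 holds. Take binary strings under Hamming distance, with ~ identifying a string with its bitwise complement. Complementing both strings leaves H unchanged, so {identity, complement} is an isometry subgroup. The sketch below computes $d_\sim$ of Definition 3 by brute force; the infimum is a minimum because each class is finite. For this group it reduces to min(H, n − H). All function names are hypothetical.

```python
from itertools import product


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def complement(x):
    return tuple(1 - b for b in x)


def orbit(x):
    """Equivalence class of x under the isometry group {identity, complement}."""
    return {x, complement(x)}


def quotient_distance(x, y):
    """d_~([x], [y]) = min over representatives a of [x], b of [y] of H(a, b)."""
    return min(hamming(a, b) for a in orbit(x) for b in orbit(y))


if __name__ == "__main__":
    n = 4
    pts = list(product((0, 1), repeat=n))
    # Closed form for this particular group: min(H, n - H).
    print(all(quotient_distance(x, y) == min(hamming(x, y), n - hamming(x, y))
              for x in pts for y in pts))                   # True
    print(quotient_distance((0, 0, 0, 0), (1, 1, 1, 0)))    # 1
```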

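In a metric space (S, d) with an equivalence relation ~, a quotient line segment is a set of the form

$$[\bar{x};\bar{y}]_{d_\sim} = \{\bar{z} \in S/{\sim} \mid d_\sim(\bar{x},\bar{z}) + d_\sim(\bar{z},\bar{y}) = d_\sim(\bar{x},\bar{y})\},$$

where $\bar{x}, \bar{y} \in S/{\sim}$. Now we can define quotient geometric crossover.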

¹ For details, see [1].
Definition 4 (Quotient geometric crossover).

A binary operator $GX_q$ is a quotient geometric crossover under the metric d and the equivalence relation ~ if all offspring are in the quotient line segment between its parents $\bar{x}$ and $\bar{y}$, i.e.,

$$\mathrm{Im}[GX_q(\bar{x},\bar{y})] \subseteq [\bar{x};\bar{y}]_{d_\sim}.$$

3.2 Genotype-Phenotype Mapping 

The notion of quotient geometric crossover is important because it lies at the heart of the relation between geometric crossover and genotype-phenotype mapping, as we illustrate next.

Genotype means solution representation: some structure that can be stored in a computer and manipulated. Phenotype means the solution itself, without any reference to how it is represented. Sometimes there is a one-to-one mapping between genotypes and phenotypes, and the distinction between them becomes purely formal. However, in many interesting cases phenotypes cannot be represented uniquely by genotypes, so the same phenotype is represented by more than one genotype. In this case we say that we have a redundant representation.

For example, to represent a graph we need to label its nodes; we can then represent it using its adjacency matrix. This representation is redundant: by relabeling its nodes, the same graph can be represented by more than one adjacency matrix.

For quite a few problems, it is hard to represent one phenotype by just one genotype using traditional representations. Roughly speaking, redundant representation causes a severe loss of search power in genetic algorithms, in particular with traditional crossovers [2]. To alleviate the problems caused by redundant representation, a number of methods such as adaptive crossover have been proposed [H QUI ESI.

Among them, a technique called normalization² is representative. It transforms the genotype of one parent into another genotype that is consistent with the other parent, so that the genotype contexts of the parents are as similar as possible during crossover. A number of successful studies have used normalization. An extensive survey of normalization appears in [2].

Previous crossovers for normalization are usually defined on a subset of genotypes. Quotient geometric crossover, by contrast, is formally defined on the whole set of genotypes, but in effect it performs normalization.

Many studies of normalization did not use the concept of distance. However, once a distance $d_G$ on the genotypes G is defined, we can formally redefine the normalization $p_2'$ of the second parent $p_2$ to the first parent $p_1$ as follows:

$$p_2' := \operatorname*{arg\,min}_{s \in w(p_2)} d_G(p_1, s),$$

where w(s) is the set of all genotypes with the same phenotype as the genotype s. Using distance to define normalization is important because it generalizes the notion of normalization to any solution representation and makes it rigorous.
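The definition of $p_2'$ translates directly into a generic argmin over the genotypes sharing $p_2$'s phenotype. The sketch below makes two assumptions: w(·) is supplied as a function that can enumerate the equivalent genotypes, and $d_G$ is Hamming distance. The example encoding is a 2-way partition stored as a 0/1 vector, where swapping the two labels gives the same partition. `normalize` and `two_way_relabelings` are hypothetical names, not from the source.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def normalize(p1: T, p2: T,
              same_phenotype: Callable[[T], Iterable[T]],
              d_g: Callable[[T, T], float]) -> T:
    """p2' := argmin_{s in w(p2)} d_G(p1, s); ties go to the first minimizer."""
    return min(same_phenotype(p2), key=lambda s: d_g(p1, s))


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def two_way_relabelings(x):
    """w(x) for a 2-way partition encoded as a 0/1 vector: x and its label swap."""
    return [x, tuple(1 - b for b in x)]


if __name__ == "__main__":
    p1 = (0, 0, 0, 1, 1, 1)
    p2 = (1, 1, 0, 0, 0, 0)  # same grouping as (0, 0, 1, 1, 1, 1)
    print(normalize(p1, p2, two_way_relabelings, hamming))  # (0, 0, 1, 1, 1, 1)
```

Enumerating w(p2) is only feasible when equivalence classes are small. The later sections use problem-specific procedures (label normalization, graph matching, alignment) for this step.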

Now we formally present the general relation between geometric crossover and genotype-phenotype mapping. The concept of normalization defined by distance is closely related to the quotient geometric crossover.

Let us consider genotype-phenotype mappings $q : G \to P$ that are not injective (redundant representation). The mapping q induces a natural equivalence relation ~ on the set of genotypes: genotypes with the same phenotype belong to the same class. Given a distance $d_G$ on genotypes G, the quotient with the relation ~ produces a distance $d_P$ on the phenotypes $P = G/{\sim}$:

$$d_P(\bar{x}, \bar{y}) = \inf_{x \in \bar{x},\, y \in \bar{y}} d_G(x, y).$$

Figure 2: Functional relationship among genotype and phenotype metric spaces and geometric crossovers on them. The quotient map q takes the genotype metric space $(G, d_G)$ to the phenotype metric space $(P, d_P)$. The geometric crossover on $P = G/{\sim}$ is marked as impossible to implement directly.

² The term normalization first appeared in [JJ. However, it is based on the adaptive crossovers proposed in [10, 18].

By applying the formal definition of geometric crossover to the metric spaces $(G, d_G)$ and $(P, d_P)$, we obtain the geometric crossovers $X_G$ and $X_P$, respectively. $X_G$ searches the space of genotypes and $X_P$ searches the space of phenotypes. Searching the space of phenotypes has a number of advantages:

(i) It is smaller than the space of genotypes, hence quicker to search.
(ii) The phenotypic distance is better tailored to the underlying problem, so the corresponding geometric crossover can be expected to work better.
(iii) The space of phenotypes has geometric characteristics different from those of the genotype space, which can be used to remove unwanted bias from geometric crossover.

However, the crossover $X_P$ cannot be used directly, because it recombines phenotypes, and phenotypes are objects that cannot be represented directly. The quotient geometric crossover lets us search the space of phenotypes with $X_P$ indirectly, by manipulating the genotypes G.

This is possible because, by the commutative diagram, there exists an induced geometricity-preserving transformation gt of the genotypic crossover $X_G$. Using the genotypic representation, gt implements a geometric crossover $gt(X_G)$ in the space of phenotypes without making explicit use of phenotypes (see Figure 2). The form of gt depends on the equivalence relation ~ used in the quotient of the underlying metric space. That relation, in turn, depends on the underlying syntax of the solution representation.

The induced geometricity-preserving transformation may be difficult to implement, computationally intractable, or both. In these cases an exact equivalent of the phenotypic geometric crossover may not be feasible. An approximation that still retains most of its advantages may then be preferable.

In the following section we consider a number of equivalence relations for the quotient operation, together with their induced genotypic crossover transformations.




4 Applications



Quotient geometric crossover has various applications. Although some of these methods have 
already been used independently, we unify the methods, which look quite different, under the 
concept of quotient geometric crossover. 

4.1 Groupings 

Grouping problems [5] are concerned with partitioning a given set of items into mutually disjoint subsets. Examples of this class of problems include multiway graph partitioning, graph coloring, and bin packing. Grouping representation is also used to solve the joint replenishment problem, a well-known problem in industrial engineering [TT5]. In this class of problems, normalization decreased the problem difficulty and led to notable improvements in performance.

Most normalization studies for grouping problems have focused on the k-way partitioning problem. For this problem, the k-ary representation has generally been used, in which the k subsets are represented by the integers from 0 to k − 1. In this case a phenotype (a k-way partition) is represented by k! different genotypes. A normalization method for this problem was used in [7], and other studies of the k-way partitioning problem used the same technique [21 Ej.

Normalization aims to minimize genotype inconsistency among chromosomes. With this in mind, in previous work [9] we proposed an optimal, efficient normalization method for grouping problems. We also proposed a distance measure, the labeling-independent distance, that eliminates this labeling dependency completely.

Let $a, b \in U = \{1, 2, \ldots, k\}^n$ be k-ary encodings (fixed-length vectors over a k-ary alphabet), and let H be the Hamming distance in U. Let $\Sigma_k$ be the set of all permutations of length k. For $\sigma \in \Sigma_k$, let $a_\sigma$ be the encoding obtained by permuting a with $\sigma$, i.e., the ith element $a_i$ of a is transformed into $\sigma(a_i)$. We define a and b to be in relation ~ if there exist $\sigma$ and $\sigma'$ in $\Sigma_k$ such that $a_\sigma = b_{\sigma'}$. Then the relation ~ is an equivalence relation (see [9]).

We define the labeling-independent distance LI on $U/{\sim}$ as follows:

$$LI(a, b) := \min_{\sigma, \sigma' \in \Sigma_k} H(a_\sigma, b_{\sigma'}).$$

$(U/{\sim}, LI)$ is a metric space, i.e., the labeling-independent distance LI is a metric on $U/{\sim}$.

In previous work we designed a new crossover based on the labeling-independent metric.

Definition 5 (Labeling-independent crossover). Normalize the second parent to the first under the Hamming distance H. Do the normal crossover using the first parent and the normalized second parent.

In fact, this crossover is the quotient geometric crossover, since its offspring lie exactly on the quotient line segment. We proved this in [11], although we did not express it using the notion of quotient geometric crossover. In sum, we have:
• Genotypes G: labeled partitions represented as vectors of symbols

• Phenotypes P: unlabeled partitions

• Equivalence relation ~: labeled partitions with the same partition structure

• Distance on genotypes $d_G$: Hamming distance

• Distance on phenotypes $d_P$: labeling-independent distance

• Crossover on genotypes $X_G$: traditional crossover for vectors

• Crossover on phenotypes $X_P$: label normalization before traditional crossover

• Induced crossover transformation gt: label normalization
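A minimal sketch of LI and of Definition 5 follows, with several assumptions of ours. Labels are 0..k−1, matching the k-ary representation described above rather than the {1, …, k} alphabet. Only the second parent is relabeled. This is equivalent to minimizing over both σ and σ', because relabeling both encodings with the same permutation leaves H unchanged. Uniform crossover stands in for the "normal crossover". The search runs over all k! permutations, so it only illustrates the definition; it is not the efficient normalization method of [9]. All names are hypothetical.

```python
import random
from itertools import permutations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def relabel(a, sigma):
    """a_sigma: replace every label a_i by sigma[a_i]."""
    return tuple(sigma[x] for x in a)


def normalize(a, b, k):
    """Relabeling of b closest to a under Hamming distance (brute force over k!)."""
    return min((relabel(b, s) for s in permutations(range(k))),
               key=lambda bs: hamming(a, bs))


def li_distance(a, b, k):
    """LI(a, b) = min over sigma of H(a, b_sigma)."""
    return hamming(a, normalize(a, b, k))


def li_crossover(a, b, k, rng=random):
    """Definition 5 sketch: normalize b to a, then uniform crossover (assumed)."""
    b_norm = normalize(a, b, k)
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b_norm))


if __name__ == "__main__":
    a = (0, 0, 1, 1, 2, 2)
    b = (2, 2, 0, 0, 1, 1)  # same grouping as a, different labels
    print(hamming(a, b), li_distance(a, b, 3))  # 6 0
    print(li_crossover(a, b, 3))  # (0, 0, 1, 1, 2, 2)
```

In the example both parents encode the same partition, so the labeling-dependent Hamming distance is maximal while LI is zero. After normalization, the crossover can only return that partition.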

Understanding normalization for grouping problems in terms of quotient geometric crossover has a further benefit: the effect of normalization can then be explained through landscape analysis. We have done this in previous work.

4.2 Graphs 

In this subsection, we consider any problem naturally defined over a graph in which the fitness of a solution does not depend on the labels of the nodes. It depends only on the structural relationship, i.e., the edges between nodes.

Formally, let $A \in \mathfrak{M}_n$ be the adjacency matrix of a labeled graph with n labeled nodes, and let P be an $n \times n$ permutation matrix³. Then PA denotes the labeled graph obtained by relabeling A according to the permutation represented by P; as a matrix product, this is $PAP^\top$. The fitness $f : \mathfrak{M}_n \to \mathbb{R}$ satisfies $f(A) = f(PA)$ for every $A \in \mathfrak{M}_n$ and every permutation matrix P.

Let $(\mathfrak{M}_n, H)$ be a metric space on the labeled graphs under the Hamming distance H. Notice that this metric is labeling-dependent. In particular, H(A, PA) may not be zero, although A and PA represent the same structure. If A is equal to PA' for some permutation matrix P, we define A and A' to be in relation ~, i.e., A ~ A'. Then the relation ~ is an equivalence relation.

An unlabeled graph q is the equivalence class of all its labeled graphs, i.e., $q(A) = \{PA \mid P \text{ is a permutation matrix}\}$. The unlabeled-graph space $\mathfrak{M}_n/{\sim}$ is the set of all equivalence classes partitioning the set $\mathfrak{M}_n$.

³ A permutation matrix is a (0, 1)-matrix with exactly one 1 in every row and column.






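We define the induced distance measure LI on $\mathfrak{M}_n/{\sim}$ as follows: for each $q, q' \in \mathfrak{M}_n/{\sim}$,

$$LI(q, q') := \min_{A \in q,\, A' \in q'} H(A, A').$$

Then $(\mathfrak{M}_n/{\sim}, LI)$ is a metric space, i.e., LI is a metric on $\mathfrak{M}_n/{\sim}$. This shows that the metric space $(\mathfrak{M}_n, H)$ induces the quotient metric space $(\mathfrak{M}_n/{\sim}, LI)$.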

Definition 6 (Labeling-independent crossover). Do the graph matching of the second parent $p_2$ to the first parent $p_1$ under the Hamming distance H, i.e.,

$$p_2' := \operatorname*{arg\,min}_{A \in q(p_2)} H(p_1, A).$$

Do the normal crossover using the first parent $p_1$ and the graph-matched second parent $p_2'$.

The following theorem shows that the labeled-graph geometric crossover for $(\mathfrak{M}_n, H)$ induces the unlabeled-graph geometric crossover for $(\mathfrak{M}_n/{\sim}, LI)$.

Theorem 2. The labeling-independent crossover is geometric under the metric LI.

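The labeling-independent crossover is defined over the unlabeled graphs $\mathfrak{M}_n/{\sim}$. This space is much smaller than the space of labeled graphs $\mathfrak{M}_n$. More precisely, each unlabeled graph corresponds to at most n! labeled graphs, so $|\mathfrak{M}_n/{\sim}| \geq |\mathfrak{M}_n|/n!$. Near-equality holds for large n, because almost all large graphs have no nontrivial automorphisms. This means that the more nodes (labels) there are, the smaller the unlabeled-graph space is compared with the labeled-graph space. A smaller space can be expected to yield better performance for the same number of evaluations.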
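For very small n, the sizes of both spaces can be counted by brute force. This shows that $|\mathfrak{M}_n|/n!$ is only a lower bound on the number of unlabeled graphs. The sketch assumes simple undirected graphs stored as edge sets rather than adjacency matrices. It identifies each unlabeled graph by a canonical form: the smallest sorted edge list over all n! relabelings. The helper names are hypothetical.

```python
from itertools import combinations, permutations
from math import factorial


def all_labeled_graphs(n):
    """Yield every simple undirected graph on nodes 0..n-1 as a frozenset of edges."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield frozenset(p for i, p in enumerate(pairs) if mask >> i & 1)


def canonical_form(edges, n):
    """Smallest sorted edge list over all n! relabelings (equal iff isomorphic)."""
    return min(
        tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        for perm in permutations(range(n))
    )


def count_graphs(n):
    """Return (#labeled graphs, #unlabeled graphs) on n nodes."""
    labeled = list(all_labeled_graphs(n))
    unlabeled = {canonical_form(g, n) for g in labeled}
    return len(labeled), len(unlabeled)


if __name__ == "__main__":
    for n in range(1, 5):
        lab, unlab = count_graphs(n)
        print(f"n={n}: labeled={lab}, unlabeled={unlab}, labeled/n!={lab / factorial(n):.2f}")
```

For n = 4 this gives 64 labeled and 11 unlabeled graphs, well above 64/4! ≈ 2.67. Symmetric graphs have fewer than n! distinct labelings, and they are common at small n.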

The previous theorem shows how graph matching guides the implementation of specific geometric crossovers. To implement the geometric crossover over unlabeled graphs, we need to use labeled graphs. Labels are needed to represent and handle the solutions, even though they are only auxiliary and can be considered not part of the problem to solve. Graph matching before crossover lets one implement the geometric crossover on the unlabeled-graph space: the corresponding geometric crossover is applied over the auxiliary space of labeled graphs after graph matching. In sum, we have:
• Genotypes G: labeled graphs with the same number of nodes, represented as adjacency matrices of the same size

• Phenotypes P: unlabeled graphs

• Equivalence relation ~: adjacency matrices with the same underlying unlabeled graph

• Distance on genotypes $d_G$: Hamming distance between adjacency matrices

• Distance on phenotypes $d_P$: labeling-independent distance between unlabeled graphs. This equals the edge edit distance.

• Crossover on genotypes $X_G$: traditional crossover on adjacency matrices seen as vectors

• Crossover on phenotypes $X_P$: graph matching before traditional crossover on adjacency matrices

• Induced crossover transformation gt: graph matching
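A brute-force sketch of Definition 6 for very small graphs follows. Graph matching tries every relabeling $PAP^\top$ of the second parent and keeps one closest to the first parent under Hamming distance. A uniform crossover is then applied. Two assumptions are ours. First, graphs are simple and undirected, so the crossover runs over the upper triangle and keeps the child symmetric. Second, uniform crossover stands in for the "normal crossover". Exhaustive matching costs n! and is not a practical graph-matching algorithm. The names are hypothetical.

```python
import random
from itertools import permutations


def relabel(adj, perm):
    """Adjacency matrix after renaming node i to perm[i] (i.e., P A P^T)."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = adj[i][j]
    return out


def hamming(a, b):
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def graph_match(p1, p2):
    """p2' := argmin over relabelings A of p2 of H(p1, A); exhaustive, small n only."""
    return min((relabel(p2, perm) for perm in permutations(range(len(p2)))),
               key=lambda a: hamming(p1, a))


def li_graph_crossover(p1, p2, rng=random):
    """Graph matching, then uniform crossover on the upper triangle."""
    q = graph_match(p1, p2)
    n = len(p1)
    child = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = p1[i][j] if rng.random() < 0.5 else q[i][j]
            child[i][j] = child[j][i] = bit  # keep the child symmetric
    return child


if __name__ == "__main__":
    p1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2
    p2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path 1-0-2: same unlabeled graph
    print(hamming(p1, p2), hamming(p1, graph_match(p1, p2)))  # 4 0
    print(li_graph_crossover(p1, p2) == p1)  # True
```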

The benefit of applying the quotient geometric crossover on graphs is the design of a crossover 
better tailored to graphs. The notion of graph matching before crossover arises directly from the 
definition of quotient geometric crossover. Graphs are very important because they are ubiq- 
uitous. In future work we will test this crossover on some applications. Graphs and groupings 
can be seen as particular cases of labeled structures in which the fitness of a solution depends 
only on the structure and not on the specific labeling. In future work we will study the class of 
labeled structures in combination with quotient geometric crossover. 

4.3 Sequences 

In this subsection we recast alignment before recombination in variable-length sequences as a consequence of quotient geometric crossover. In previous work [17] we applied geometric crossover to variable-length sequences. The distance we used there is the edit distance LD⁴: the minimum number of single-character insertions, deletions, and replacements needed to transform one sequence into the other. The geometric crossover associated with this distance is the homologous geometric crossover, in which the two sequences are aligned optimally before recombination.

Alignment here means allowing parent sequences to be stretched so that they match each other better. Formally, stretching sequences means interleaving '-' symbols anywhere, and in any number, to create two stretched sequences of the same length with minimal Hamming distance. For example, to recombine agcacaca and acacacta, we first align them optimally as agcacac-a and a-cacacta. Notice that the Hamming distance between the aligned sequences (2) is less than the Hamming distance between the non-aligned sequences (6).
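The edit distance LD can be computed with the standard Levenshtein dynamic program. The sketch below, with hypothetical function names, reproduces the numbers in the example above. The edit distance of agcacaca and acacacta is 2, the same as the Hamming distance of their optimal alignment. The unaligned Hamming distance is 6.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance LD: minimum number of single-character
    insertions, deletions, and replacements turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # replace (or keep) cs
        prev = cur
    return prev[-1]


def hamming(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))


if __name__ == "__main__":
    print(edit_distance("agcacaca", "acacacta"))  # 2
    print(hamming("agcacaca", "acacacta"))        # 6
    print(hamming("agcacac-a", "a-cacacta"))      # 2
```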

After the optimal alignment, one does the normal crossover and produces a new stretched sequence. The offspring is obtained by removing the '-' symbols, i.e., by unstretching the sequence.

How does quotient geometric crossover fit in here? We can define a relation ~ on stretched sequences: each stretched sequence belongs to the class of its unstretched version. It is easy to check that ~ is an equivalence relation. Let $\mathcal{S}(s)$ be the set of all stretched sequences of a sequence s. We define the induced distance measure $d_\sim$ as follows. Let $s_1, s_2$ be variable-length sequences, and let H be the Hamming distance for stretched sequences⁵. Then

$$d_\sim(s_1, s_2) := \min_{s_1' \in \mathcal{S}(s_1),\, s_2' \in \mathcal{S}(s_2)} H(s_1', s_2').$$

⁴ The notation LD comes from Levenshtein distance, which is another name for edit distance.
Then, by the definition of edit distance, $d_\sim$ is equal to LD. Hence $d_\sim$ is a metric on variable-length sequences.

Theorem 3. Homologous crossover is geometric under the edit distance [17].
In summary, we have the following. 
⁵ If sequences have different lengths, their Hamming distance is computed after aligning the sequences at the left. The tail of the longer sequence is counted as different from the missing tail of the shorter sequence.

This idea can be extended to any stretchable structure, e.g., stretchable graphs. 

• Genotypes G: variable-length stretched sequences

• Phenotypes P: variable-length (unstretched) sequences

• Equivalence relation ~: stretched sequences with the same unstretched sequence

• Distance on genotypes $d_G$: If the two stretched sequences have different lengths, pad the shorter one on the right with '-' until it is as long as the longer one. Their genotypic distance is then their Hamming distance.

• Distance on phenotypes $d_P$: edit distance between sequences

• Crossover on genotypes $X_G$: traditional crossover on stretched sequences. If the two stretched sequences have different lengths, pad the shorter one on the right with '-' until it is as long as the longer one.

• Crossover on phenotypes $X_P$: homologous crossover for sequences

• Induced crossover transformation gt: optimal alignment
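The three steps in this list can be sketched together: gt is optimal alignment, $X_G$ is crossover on stretched sequences, and the child is then unstretched. The alignment recovers one optimal path from the standard edit-distance table. When several optimal alignments exist, ties are broken in a fixed order, which is an arbitrary choice of ours. Uniform crossover is assumed as the "normal crossover", and input sequences are assumed not to contain '-'. `align` and `homologous_crossover` are hypothetical names, not the implementation of [17].

```python
import random

GAP = "-"  # stretching symbol, as in the text


def align(s, t):
    """Return one optimal alignment (s', t'): two stretched sequences of
    equal length whose Hamming distance equals LD(s, t)."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,   # s[i-1] against a gap
                          d[i][j - 1] + 1,   # gap against t[j-1]
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    # Trace back one optimal path; ties are broken diagonal-first.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            a.append(s[i - 1])
            b.append(GAP)
            i -= 1
        else:
            a.append(GAP)
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def homologous_crossover(s, t, rng=random):
    """Align (gt), uniform crossover on the stretched sequences (X_G), unstretch."""
    sa, ta = align(s, t)
    child = "".join(x if rng.random() < 0.5 else y for x, y in zip(sa, ta))
    return child.replace(GAP, "")


if __name__ == "__main__":
    s, t = "agcacaca", "acacacta"
    print(align(s, t))
    print(homologous_crossover(s, t, random.Random(0)))
```

Each column of the child's stretched form copies one of the two parents' columns. The child's edit distances to the two parents therefore add up to the parents' edit distance, which is the geometricity stated in Theorem 3.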

Phenotypes are variable-length sequences, which are directly representable. So in this case, quotient geometric crossover is not used to search a non-directly representable space (phenotypes) through an auxiliary directly representable space (genotypes). Instead, the benefit of applying quotient geometric crossover to variable-length sequences is conceptual. The homologous crossover over sequences $X_P$ is naturally understood as a transformation gt of the geometric crossover $X_G$ over stretched sequences G, rather than as a crossover acting directly on sequences P. This is because the notion of optimal alignment is inherently defined on stretched sequences, not on simple sequences.

In previous work [17] we tested the homologous crossover on the protein motif discovery problem. In future work we want to study how the optimal alignment transformation affects the fitness landscape associated with geometric crossover, with and without alignment.
4.4 Traveling Salesman Problem

In previous work [13] we applied geometric crossover to the traveling salesman problem (TSP). Solutions are tours of cities, or circular permutations. A good neighborhood structure for TSP is the one based on the 2-opt move, which simply reverses the order of the cities of a contiguous subtour. This move induces a graphic distance between tours: the minimum number of reversals needed to transform one tour into the other. The geometric crossover associated with this distance belongs to the family of sorting crossovers: it picks offspring on a minimum sorting trajectory between the parent circular permutations sorted by reversals. Tours of cities, or circular permutations, cannot be represented directly; they are represented with simple permutations. Gluing the head and tail of a permutation yields a circular permutation. However, each circular permutation is represented by more than one simple permutation. How does quotient geometric crossover fit in here? We can define an equivalence relation on the simple permutations: each simple permutation belongs to the class of its associated circular permutation. So we have:
• Genotypes G: permutations

• Phenotypes P: circular permutations (tours)

• Equivalence relation ~: permutations identifying the same circular permutation

• Distance on genotypes dG: reversal distance between permutations

• Distance on phenotypes dP: reversal distance between circular permutations

• Crossover on genotypes XG: sorting-by-reversals crossover for permutations

• Crossover on phenotypes XP: sorting-by-reversals crossover for circular permutations, implemented on simple permutations: circularly shift one parent to match the other as closely as possible before applying the sorting crossover

• Induced crossover transformation gt: circular shift before the sorting-by-reversals crossover
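The induced transformation gt and the resulting crossover can be sketched in Python as follows. Assumptions of this sketch: tours are Python lists of city indices, only rotations (not reflections) are identified, and the names are hypothetical. Because computing a minimum sorting by unsigned reversals is NP-hard, the sketch swaps in a simple greedy reversal sort, so the offspring lie on a reversal path between the parents that is not guaranteed to be shortest.

```python
import random


def best_rotation(reference, perm):
    """gt: the circular shift of perm that agrees with reference in the most positions."""
    rotations = [perm[k:] + perm[:k] for k in range(len(perm))]
    return max(rotations, key=lambda r: sum(x == y for x, y in zip(reference, r)))


def reversal_path(source, target):
    """Greedy sequence of reversals turning source into target.

    Each step reverses one contiguous block to fix the next position, so the
    path has at most n - 1 reversals but is NOT guaranteed to be minimal."""
    current = list(source)
    path = [list(current)]
    for i in range(len(current)):
        j = current.index(target[i])  # j >= i because positions < i already match
        if j != i:
            current[i:j + 1] = current[i:j + 1][::-1]
            path.append(list(current))
    return path


def circular_sorting_crossover(p1, p2, rng=random):
    """Rotate p2 to best match p1, then pick an offspring on a reversal path
    from p1 to the rotated p2."""
    aligned = best_rotation(p1, p2)
    path = reversal_path(p1, aligned)
    return path[rng.randrange(len(path))]
```

Without the rotation step, two parents that encode the same tour from different starting cities would look far apart and produce offspring unrelated to either tour; the circular shift removes that spurious distance before recombination.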

This example of quotient geometric crossover illustrates how to obtain a geometric crossover for a transformed representation (circular permutations) starting from a geometric crossover for the original representation (simple permutations). So in this case quotient geometric crossover is used as a tool to build a new crossover for a derived representation from a known geometric crossover for the original representation. From previous work we know that the sorting-by-reversals crossover for simple permutations is an excellent crossover for TSP. In future work we want to test the sorting-by-reversals crossover for circular permutations. Since circular permutations are a direct representation of city tours, we expect it to perform even better.

4.5 Functions 

Here we consider functional representations, i.e., any representation that encodes a function. Examples of this type of representation are genetic programming (GP) trees, finite state automata, and neural networks. We can define an equivalence relation on the solution space: two solutions are equivalent if they represent the same function. So we have:



  original metric                metric for the specific representation
  quotient metric                representation-independent metric among representable functions
  original geometric crossover   geometric crossover for the specific representation
  quotient geometric crossover   geometric crossover in the function space



4.5.1 Genetic Programming 

We can define an equivalence relation: all symbolic expressions that represent the same function are equivalent. We can also define a weaker equivalence relation: consider as equivalent those syntactic trees that differ only in the order of the operands of nodes with commutative operations. For example, the multiplication operation '*' is commutative, so the trees for x * y and y * x are different but represent the same function.



  original metric                structural Hamming distance between rooted ordered trees [14]
  quotient metric                structural Hamming distance between rooted unordered trees
                                 (only for commutative nodes)
  original geometric crossover   homologous crossover for GP trees
  quotient geometric crossover   homologous crossover for GP trees with reordering of commutative
                                 subtrees to have minimum structural Hamming distance



This quotient geometric crossover is based on the weaker equivalence relation, so it is not fully semantic. Even so, this quotient geometric crossover cannot be implemented efficiently, because the cost of computing the structural Hamming distance between rooted unordered trees grows exponentially with the number of nodes in the trees.

• Genotypes G: parse trees, a compact (shorter than the extensive form), redundant (the same function can be represented by more than one parse tree) and biased (some functions have more associated parse trees than others) representation of functions.

• Phenotypes P: computed functions. A generic function can be directly represented in an extensive form as a vector with one cell for every combination of the input values, containing the output of the function for those values. We call this vector the output vector representation of the function. In practice this direct representation is not used because it is too long.

• Equivalence relation ~: parse trees that compute the same function, or equivalently have the same output vector.

• Distance on genotypes dG: structural Hamming distance between parse trees

• Distance on phenotypes dP: (weighted) Hamming distance on output vectors

• Crossover on genotypes XG: homologous crossover for parse trees

• Crossover on phenotypes XP: traditional crossover on the output vectors of the functions

• Induced crossover transformation gt: expand, reduce or otherwise change the syntactic trees before crossover, without changing the underlying computed functions, so that they have minimal structural Hamming distance
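To make the output vector representation and the equivalence relation concrete, here is a minimal Python sketch for Boolean functions. It assumes a hypothetical tuple encoding of parse trees and uses unit weights in dP; it covers only the phenotype side (output vectors, ~, dP and XP) and does not attempt the transformation gt, which is the expensive part. Note that a function of n Boolean inputs has an output vector of 2^n entries, which is why the extensive form is impractical.

```python
import random
from itertools import product

# Hypothetical encoding: a tree is a variable name "x0", "x1", ... or a tuple
# (op, left, right) with op in OPS; all operators here are binary and Boolean.
OPS = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "xor": lambda a, b: a != b,
}


def evaluate(tree, env):
    """Evaluate a parse tree under a variable assignment env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


def output_vector(tree, n_vars):
    """Extensive form: the output for every combination of n_vars Boolean inputs."""
    names = [f"x{i}" for i in range(n_vars)]
    return tuple(evaluate(tree, dict(zip(names, values)))
                 for values in product([False, True], repeat=n_vars))


def equivalent(t1, t2, n_vars):
    """~ : two parse trees are equivalent if they compute the same function."""
    return output_vector(t1, n_vars) == output_vector(t2, n_vars)


def phenotypic_distance(t1, t2, n_vars):
    """dP with unit weights: Hamming distance between output vectors."""
    v1, v2 = output_vector(t1, n_vars), output_vector(t2, n_vars)
    return sum(a != b for a, b in zip(v1, v2))


def phenotypic_crossover(t1, t2, n_vars, rng=random):
    """XP: uniform crossover on output vectors. The result is an output vector,
    not a parse tree; producing a tree for it is what gt would have to achieve."""
    v1, v2 = output_vector(t1, n_vars), output_vector(t2, n_vars)
    return tuple(rng.choice(pair) for pair in zip(v1, v2))
```

For example, ("or", ("and", "x0", "x1"), "x0") and "x0" are different parse trees in the same equivalence class, since both have the output vector (False, False, True, True).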

The benefit of applying quotient geometric crossover to parse trees is to search the space of the functions represented by the parse trees rather than the space of parse trees. This is done indirectly by manipulating parse trees. In principle a function can be represented directly by its output vector, which makes it not strictly necessary to resort to an auxiliary genotypic representation and to the quotient geometric crossover to search this space; however, such a direct representation is simply too long for any practical purpose, and a concise genotypic representation is needed.

Implementing the phenotypic geometric crossover through the transformation gt applied to the genotypic crossover XG presents a problem: gt cannot be computed efficiently, because it requires the smallest structural Hamming distance over all transformations of the syntactic trees that keep their underlying functions invariant. We could relax the problem and consider a weaker equivalence relation in which two parse trees are equivalent if they become equal after exchanging the subtrees of nodes with commutative operations (a syntactic transformation that keeps the computed function invariant). In this case dP becomes the distance between rooted (partially) unordered trees. The computational cost of this distance grows exponentially with the number of commutative nodes in the syntactic trees, so it can still be hard to compute, and so can the associated geometric crossover XP. However, there are fast approximation algorithms for this distance. In future work we will try this crossover.

4.5.2 Finite State Machines

Finite state machines (FSMs) can represent discrete functions or classifiers: given any input sequence, they return the class of that sequence. They are represented as labeled rooted directed graphs, or equivalently with a transition matrix. We can define an equivalence relation: all the FSMs that represent the same classifier are equivalent. We can also define a weaker equivalence relation: all the FSMs that are identical up to a relabeling of their states. In fact, as in graph partitioning, the labels are assigned completely arbitrarily.



  original metric                Hamming distance on transition matrix
  quotient metric                Hamming distance on normalized transition matrix
  original geometric crossover   traditional crossover
  quotient geometric crossover   normalization before recombination of the transition matrix



• Genotypes G: transition matrices

• Phenotypes P: classification functions

• Equivalence relation ~: transition matrices that give rise to the same classifier (same output vectors, defined as for the parse trees)

• Distance on genotypes dG: Hamming distance between transition matrices

• Distance on phenotypes dP: minimum Hamming distance between unlabeled transition matrices, which equals the (weighted) Hamming distance on output vectors

• Crossover on genotypes XG: traditional crossover on transition matrices

• Crossover on phenotypes XP: crossover of the underlying classifiers (traditional crossover on the output vectors)

• Induced crossover transformation gt: put both FSMs in a normal form (for example using their lexicographic order) before recombining their transition matrices with traditional crossover. This is a quick heuristic that approximates the phenotypic crossover.
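The normalization heuristic for gt can be sketched in Python as below. Several details are our assumptions rather than the text's: an FSM is a Moore-style classifier given as (start, delta, outputs), where delta[s][a] is the next state from s on input symbol a and outputs[s] is the class of s; both parents have the same number of states and input symbols; the normal form relabels states in breadth-first order from the start state, one concrete choice used here in place of the lexicographic order mentioned above; and uniform crossover plays the role of traditional crossover. All names are hypothetical.

```python
import random
from collections import deque


def classify(start, delta, outputs, word):
    """Run the machine on a sequence of input symbols and return the final state's class."""
    state = start
    for a in word:
        state = delta[state][a]
    return outputs[state]


def normalize(start, delta, outputs):
    """gt: relabel states in breadth-first order from start, scanning input symbols
    in increasing order; unreachable states follow in their original order.
    Returns (delta, outputs) with the start state relabeled as 0."""
    label = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nxt in delta[s]:
            if nxt not in label:
                label[nxt] = len(label)
                queue.append(nxt)
    for s in range(len(delta)):  # states unreachable from start
        if s not in label:
            label[s] = len(label)
    old = sorted(label, key=label.get)  # old[new_label] = original state
    new_delta = [[label[t] for t in delta[s]] for s in old]
    new_outputs = [outputs[s] for s in old]
    return new_delta, new_outputs


def flatten(machine):
    delta, outputs = machine
    return [t for row in delta for t in row] + list(outputs)


def hamming(m1, m2):
    """dG on (delta, outputs) pairs of equal shape."""
    return sum(a != b for a, b in zip(flatten(m1), flatten(m2)))


def crossover(fsm1, fsm2, rng=random):
    """Normalize both parents, then apply uniform crossover to the transition
    matrices and state outputs; the offspring's start state is 0."""
    (d1, o1), (d2, o2) = normalize(*fsm1), normalize(*fsm2)
    delta = [[rng.choice(pair) for pair in zip(r1, r2)] for r1, r2 in zip(d1, d2)]
    outputs = [rng.choice(pair) for pair in zip(o1, o2)]
    return delta, outputs
```

Two machines that differ only in how their states are numbered have a positive raw Hamming distance but identical normal forms, so after gt the traditional crossover no longer sees differences that are invisible at the phenotypic level.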

The benefit of the quotient geometric crossover is that it can search the space of classifiers using a concise representation.

4.5.3 Neural Networks 

Neural networks can be represented by real matrices of connection weights. We can define an equivalence relation: all the matrices that give rise to the same input-output mapping are equivalent. We can also define a weaker equivalence relation: all the matrices that become the same when reordered (i.e., when their neurons are relabeled).



  original metric                Manhattan distance on matrices
  quotient metric                Manhattan distance between unlabeled matrices
  original geometric crossover   traditional box crossover for real vectors
• Genotypes G: weight matrices

• Phenotypes P: continuous functions

• Equivalence relation ~: weight matrices are equivalent when they give rise to the same function (the same output vector, defined as for the parse trees; in this case the vector is infinite-dimensional)

• Distance on genotypes dG: Manhattan distance between weight matrices

• Distance on phenotypes dP: weight-label-independent Manhattan distance between weight matrices, which equals a properly weighted Manhattan distance on output vectors (the distance must be finite)

• Crossover on genotypes XG: box recombination of weight matrices

• Crossover on phenotypes XP: box recombination of continuous functions

• Induced crossover transformation gt: normalization of the weight labels before box recombination of the weight matrices
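A minimal Python sketch of normalization before box recombination for a network with one hidden layer is given below, assuming hypothetical nested-list weight matrices, tanh hidden units and a linear output layer. Sorting hidden units by their incoming weight vectors is our choice of normal form and only a cheap heuristic: for generic weights it removes the hidden-unit permutation symmetry, but it does not compute the true minimum over all relabelings, and it ignores other symmetries such as sign flips of tanh units.

```python
import math
import random


def forward(w_in, w_out, x):
    """One hidden layer of tanh units followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def normalize(w_in, w_out):
    """gt (heuristic): reorder hidden units by their incoming weight vectors.

    w_in[j] holds the weights into hidden unit j; w_out[k][j] is the weight
    from hidden unit j to output k. The computed function is unchanged."""
    order = sorted(range(len(w_in)), key=lambda j: w_in[j])
    return [list(w_in[j]) for j in order], [[row[j] for j in order] for row in w_out]


def flatten(w_in, w_out):
    return [w for row in w_in for w in row] + [w for row in w_out for w in row]


def manhattan(net1, net2):
    """dG: Manhattan distance between the raw weight matrices."""
    return sum(abs(a - b) for a, b in zip(flatten(*net1), flatten(*net2)))


def normalized_distance(net1, net2):
    """Heuristic stand-in for the label-independent distance: dG after normalization."""
    return manhattan(normalize(*net1), normalize(*net2))


def box_crossover(net1, net2, rng=random):
    """Normalize both parents, then draw every weight uniformly in the box they span."""
    (a_in, a_out), (b_in, b_out) = normalize(*net1), normalize(*net2)

    def mix(m1, m2):
        return [[rng.uniform(min(x, y), max(x, y)) for x, y in zip(r1, r2)]
                for r1, r2 in zip(m1, m2)]

    return mix(a_in, b_in), mix(a_out, b_out)
```

Two networks that differ only by a permutation of hidden units compute the same function and, after normalization, are at distance zero, so box recombination does not average weights that play different roles in the two parents.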

The benefit of the quotient geometric crossover is that its geometric crossover can search the space of continuous functions indirectly, using a concise representation (the weight matrices). The geometric crossover defined over continuous functions cannot be implemented directly on the phenotype space (the space of functions), because it would need to recombine infinite-dimensional vectors.

4.6 Neutrality 

The role of neutrality is little understood. Note that here neutrality is a synonym for redundancy. As a rule of thumb, one would like to filter out redundancy to improve performance, as normalization does for structural problems. However, neutrality may have some beneficial aspects for variable-length representations. In fact, it can be used to obtain a self-adaptive mutation rate at the phenotypic level. Imagine applying a constant number of mutations at the genotypic level. If the informative part of the genotype, the one used to obtain the phenotype, is small compared with the non-informative part, mutations at the genotypic level have a small chance of affecting the phenotype. So the same mutation rate at the genotypic level can correspond to a smaller or equal mutation rate at the phenotypic level, depending on the amount of neutral code in the genotype. Since mutation itself inserts or deletes neutral code, combined with selection this yields a self-adaptive mechanism that selects genotypes with the right amount of neutral code to be more evolvable. Neutrality is widespread in nature, so studying it is important. Quotient geometric crossover can be used to understand how crossover and neutrality interact. In fact, the induced geometricity-preserving transformation tells us how to remove redundancy for crossover while keeping it for mutation, so as to obtain the self-adaptive mutation rate and take advantage of both the genotype and phenotype spaces.

• Genotypes G: sequences with neutral code (parts of the sequence that, if removed, would not affect the phenotype)

• Phenotypes P: sequences without neutral code. There is a one-to-one mapping between these sequences and the phenotypes, so they are a direct representation of the phenotypes rather than the phenotypes themselves.

• Equivalence relation ~: two sequences with neutral code are equivalent if they become the same phenotypic sequence once their neutral code is removed.

• Distance on genotypes dG: edit distance on sequences with neutral code

• Distance on phenotypes dP: edit distance on sequences without neutral code

• Crossover on genotypes XG: homologous crossover for sequences

• Crossover on phenotypes XP: homologous crossover for sequences

• Induced crossover transformation gt: identity transformation
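The genotype-phenotype map and the self-adaptive mutation argument can be illustrated with a toy Python sketch. The convention that lowercase symbols are neutral code and uppercase symbols are informative is purely hypothetical, as is the DNA-like alphabet; mutation here is a point substitution that keeps each symbol's kind, which is simpler than the insertions and deletions discussed in the text.

```python
import random


def phenotype(genotype):
    """Hypothetical convention: lowercase symbols are neutral code, uppercase are informative."""
    return "".join(c for c in genotype if c.isupper())


def equivalent(g1, g2):
    """~ : same phenotypic sequence once the neutral code is removed."""
    return phenotype(g1) == phenotype(g2)


def point_mutation(genotype, rng=random):
    """Replace one uniformly chosen symbol by a random symbol of the same kind,
    so informative code stays informative and neutral code stays neutral."""
    i = rng.randrange(len(genotype))
    alphabet = "ACGT" if genotype[i].isupper() else "acgt"
    return genotype[:i] + rng.choice(alphabet) + genotype[i + 1:]


def phenotypic_hit_rate(genotype):
    """Probability that point_mutation picks a position in the informative part."""
    return len(phenotype(genotype)) / len(genotype)
```

With the same genotypic mutation rate, a genotype padded with more neutral code has a lower hit rate, so its effective phenotypic mutation rate is smaller; this is the mechanism that selection can tune.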

The benefit of the quotient geometric crossover here is to show how crossover and neutral code interact. We have seen that neutrality may be beneficial in terms of an adaptive mutation rate. Since the induced crossover transformation is the identity, the same crossover that searches the genotype space can be understood as a crossover that searches the phenotype space indirectly through the genotypes. In other words, the neutral code is completely transparent to crossover and does not affect its search or performance. So neutral code retains the advantage of an adaptive mutation rate while being transparent to the action of crossover.

5 Concluding Remarks 

In this paper we have extended the geometric framework by introducing the notion of quotient geometric crossover, which can be clearly understood using the concept of a geometricity-preserving transformation. Quotient geometric crossover is a very general and versatile tool, and we have given a number of interesting examples of its application. As these applications show, quotient geometric crossover is not only theoretically significant but also has the practical effect of making search more effective, by reducing the search space or removing the inherent bias. Further theoretical analysis will appear in the extended full paper, and more detailed applications for each case are left for future study.